AITopics | expert preference

expert preference

Death of the Novel(ty): Beyond n-Gram Novelty as a Metric for Textual Creativity

Saakyan, Arkadiy, Kim, Najoung, Muresan, Smaranda, Chakrabarty, Tuhin

arXiv.org Artificial Intelligence, Sep-29-2025

N-gram novelty is widely used to evaluate language models' ability to generate text outside of their training data. More recently, it has also been adopted as a metric for measuring textual creativity. However, theoretical work on creativity suggests that this approach may be inadequate, as it does not account for creativity's dual nature: novelty (how original the text is) and appropriateness (how sensical and pragmatic it is). We investigate the relationship between this notion of creativity and n-gram novelty through 7542 expert writer annotations (n=26) of novelty, pragmaticality, and sensicality via close reading of human and AI-generated text. We find that while n-gram novelty is positively associated with expert writer-judged creativity, ~91% of top-quartile expressions by n-gram novelty are not judged as creative, cautioning against relying on n-gram novelty alone. Furthermore, unlike human-written text, higher n-gram novelty in open-source LLMs correlates with lower pragmaticality. In an exploratory study with frontier closed-source models, we additionally confirm that they are less likely to produce creative expressions than humans. Using our dataset, we test whether zero-shot, few-shot, and finetuned models are able to identify creative expressions (a positive aspect of writing) and non-pragmatic ones (a negative aspect). Overall, frontier LLMs exhibit performance much higher than random but leave room for improvement, especially struggling to identify non-pragmatic expressions. We further find that LLM-as-a-Judge novelty scores from the best-performing model were predictive of expert writer preferences.
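For readers unfamiliar with the metric discussed in this abstract, here is a minimal sketch of one common way to compute n-gram novelty: the fraction of a text's n-grams that never occur in a reference corpus. The abstract does not give the authors' exact definition, so the whitespace tokenization, the lowercasing, the default n=3, and the function names `ngrams` and `ngram_novelty` are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_novelty(text: str, reference_corpus: Iterable[str], n: int = 3) -> float:
    """Fraction of the text's n-grams that do not appear in the reference corpus.

    Assumption: naive lowercase whitespace tokenization. A real study would
    use a proper tokenizer and a much larger reference corpus.
    """
    seen: Set[Tuple[str, ...]] = set()
    for doc in reference_corpus:
        seen.update(ngrams(doc.lower().split(), n))

    candidate = ngrams(text.lower().split(), n)
    if not candidate:
        return 0.0  # text shorter than n tokens: no n-grams to score
    unseen = sum(1 for gram in candidate if gram not in seen)
    return unseen / len(candidate)


corpus = ["the cat sat on the mat"]
novelty = ngram_novelty("the cat sat on the moon", corpus, n=3)
print(novelty)  # one of the four trigrams is unseen
```

A score like this measures only surface originality. It says nothing about whether the text is sensical or pragmatic, which is the gap the abstract points to.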

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2509.22641

Country:

North America > United States (1.00)
Asia (1.00)
Europe > United Kingdom > England (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.95)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ELO-Rated Sequence Rewards: Advancing Reinforcement Learning Models

Ju, Qi, Hei, Falin, Fang, Zhemei, Luo, Yunfeng

arXiv.org Artificial Intelligence, Sep-5-2024

Reinforcement Learning (RL) is highly dependent on the meticulous design of the reward function. However, accurately assigning rewards to each state-action pair in Long-Term RL (LTRL) challenges is formidable. Consequently, RL agents are predominantly trained with expert guidance. Drawing on the principles of ordinal utility theory from economics, we propose a novel reward estimation algorithm: ELO-Rating based RL (ERRL). This approach is distinguished by two main features. Firstly, it leverages expert preferences over trajectories instead of cardinal rewards (utilities) to compute the ELO rating of each trajectory as its reward. Secondly, a new reward redistribution algorithm is introduced to mitigate training volatility in the absence of a fixed anchor reward. Our method demonstrates superior performance over several leading baselines in long-term scenarios (extending up to 5000 steps), where conventional RL algorithms falter. Furthermore, we conduct a thorough analysis of how expert preferences affect the outcomes.
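To make the idea of rating trajectories from expert preferences concrete, the sketch below applies the standard Elo update to a list of pairwise preferences. It is not the ERRL algorithm from the paper: the abstract does not specify the update rule, and the reward redistribution step is not shown. The K-factor of 32, the initial rating of 1500, and the names `expected_score` and `elo_from_preferences` are illustrative assumptions.

```python
from typing import Dict, Iterable, Tuple


def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_from_preferences(
    preferences: Iterable[Tuple[str, str]],
    k: float = 32.0,          # assumed K-factor
    initial: float = 1500.0,  # assumed starting rating
) -> Dict[str, float]:
    """Compute Elo ratings from (preferred, rejected) trajectory-id pairs.

    Each pair records that an expert preferred the first trajectory over
    the second; no cardinal reward values are needed.
    """
    ratings: Dict[str, float] = {}
    for winner, loser in preferences:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        delta = k * (1.0 - expected_score(r_w, r_l))
        ratings[winner] = r_w + delta  # winner gains what the loser gives up
        ratings[loser] = r_l - delta
    return ratings


prefs = [("traj_a", "traj_b"), ("traj_a", "traj_c"), ("traj_b", "traj_c")]
ratings = elo_from_preferences(prefs)
print(sorted(ratings, key=ratings.get, reverse=True))
```

Because every update moves the same number of points from the rejected trajectory to the preferred one, the ratings have no fixed anchor, only relative meaning. That matches the abstract's remark about the absence of a fixed anchor reward.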

agent, algorithm, trajectory, (14 more...)

arXiv.org Artificial Intelligence

2409.03301

Country:

North America > United States > Texas (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment > Sports (0.94)
Leisure & Entertainment > Games > Chess (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

GREEN: Generative Radiology Report Evaluation and Error Notation

Ostmeier, Sophie, Xu, Justin, Chen, Zhihong, Varma, Maya, Blankemeier, Louis, Bluethgen, Christian, Michalson, Arne Edward, Moseley, Michael, Langlotz, Curtis, Chaudhari, Akshay S, Delbrouck, Jean-Benoit

arXiv.org Artificial Intelligence, May-6-2024

Evaluating radiology reports is a challenging problem as factual correctness is extremely important due to the need for accurate medical communication about medical images. Existing automatic evaluation metrics either suffer from failing to consider factual correctness (e.g., BLEU and ROUGE) or are limited in their interpretability (e.g., F1CheXpert and F1RadGraph). In this paper, we introduce GREEN (Generative Radiology Report Evaluation and Error Notation), a radiology report generation metric that leverages the natural language understanding of language models to identify and explain clinically significant errors in candidate reports, both quantitatively and ...

Machine learning has enabled great progress in the automatic interpretation of images, where vision language models (VLMs) translate features of images into text (Radford et al., 2021; Liu et al., 2024). In the medical domain, patient images are interpreted by radiologists, which is referred to as radiology report generation (RRG). Automated and high-quality RRG has the potential to greatly reduce the repetitive work of radiologists, alleviate burdens arising from shortage of radiologists, generally improve clinical communication (Kahn Jr et al., 2009), and increase the accuracy of radiology reports (Rajpurkar and Lungren, 2023). Commonly used evaluation metrics in RRG literature (Lin, 2004; Zhang et al., 2019; Smit et al., 2020; Delbrouck et al., 2022) seek to evaluate a generated radiology report against a reference report written by a radiologist by leveraging simple n-grams overlap, general language similarity, pathology identification within specific imaging modalities and disease classes, and commercially-available large language models.

dataset, error count, radiology report, (14 more...)

arXiv.org Artificial Intelligence

2405.03595

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre: Research Report > Experimental Study (0.69)

Industry:

Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Connectionist Learning of Expert Preferences by Comparison Training

Neural Information Processing Systems, Apr-6-2023, 19:59:00 GMT

A new training paradigm, called the "comparison paradigm," is introduced for tasks in which a network must learn to choose a preferred pattern from a set of n alternatives, based on examples of human expert preferences. In this paradigm, the input to the network consists of two of the n alternatives, and the trained output is the expert's judgement of which pattern is better. This paradigm is applied to the learning of backgammon, a difficult board game in which the expert selects a move from a set of legal moves. Furthermore, it is possible to set up the network so that it always produces consistent rank-orderings.
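The comparison setup described above can be illustrated with a small sketch. It uses a linear scoring function in place of the paper's neural network and trains it on (preferred, rejected) pairs with a logistic loss on the score difference. Scoring both alternatives with the same function is one way to get consistent rank-orderings. That design is assumed here and is not taken from the abstract, which does not describe the network construction. All names and hyperparameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (features of the preferred alternative, features of the rejected one)
Pair = Tuple[Sequence[float], Sequence[float]]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Shared evaluation function applied to each alternative separately."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_comparison(pairs: List[Pair], dim: int,
                     lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Fit weights so that preferred alternatives score higher than rejected ones.

    Models P(better beats worse) = sigmoid(score(better) - score(worse)) and
    performs stochastic gradient ascent on the log-likelihood.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            margin = score(w, better) - score(w, worse)
            p = 1.0 / (1.0 + math.exp(-margin))
            step = lr * (1.0 - p)  # gradient of log(p) w.r.t. the margin
            for i in range(dim):
                w[i] += step * (better[i] - worse[i])
    return w


# Toy expert who prefers alternatives with a larger first feature.
pairs: List[Pair] = [([1.0, 0.0], [0.0, 1.0]), ([2.0, 0.0], [1.0, 1.0])]
w = train_comparison(pairs, dim=2)
candidates = [[0.0, 1.0], [2.0, 0.0], [1.0, 0.5]]
ranked = sorted(candidates, key=lambda x: score(w, x), reverse=True)
print(ranked[0])
```

Since every alternative receives a single scalar score, sorting by that score can never produce a cycle such as A over B, B over C, and C over A.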

comparison training, connectionist learning, expert preference, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

In this paper, we consider problem domains in which the expert is given a set of n ...

artificial intelligence, connectionist learning, machine learning, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
